Topic 6: Limitations of Linear Regression
London School of Economics and Political Science
December 1, 2025
Understanding when OLS breaks down shapes how we interpret every regression
\[\widehat{\text{oil price}}_{t+1} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{inventory}_{t}\]
She omits variables like geopolitical tensions, OPEC decisions, and dollar strength.
Should she worry about omitted variable bias?
The answer depends entirely on her objective
Orthogonality, the projection condition:
\[y = x'\beta + e, \qquad \mathbb{E}[x \cdot e] = 0\]
Exogeneity, the causal condition:
\[y = x'\beta + e, \qquad \mathbb{E}[e \mid x] = 0\]
Exogeneity \(\implies\) Orthogonality, but not the reverse
Omitted variable bias is a causal concept—it has no meaning in pure prediction
| Goal | Requires Causation? | Model Needed |
|---|---|---|
| Predict oil price tomorrow | No | Projection |
| Understand what drives prices | Yes | Regression |
For forecasting, causal identification is irrelevant
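To see concretely why forecasting survives omitted variables, here is a small simulation under a made-up data-generating process (the variable names and all coefficients are illustrative, not taken from the example above): the short regression's slope is biased away from the causal effect, yet its predictions on fresh data from the same process are near-optimal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical DGP: geopolitical "tension" is omitted; it moves the price
# and is correlated with inventory
tension = rng.normal(size=n)
inventory = -0.8 * tension + rng.normal(size=n)
price = 2.0 - 0.5 * inventory + 1.0 * tension + rng.normal(size=n)

# Short regression of price on inventory alone
X = np.column_stack([np.ones(n), inventory])
beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)

# The slope absorbs part of the tension effect: far from the causal -0.5
print(beta_hat[1])

# But predictions on new data drawn from the SAME process are accurate:
# the projection coefficient is exactly what minimises prediction error
tension2 = rng.normal(size=n)
inventory2 = -0.8 * tension2 + rng.normal(size=n)
price2 = 2.0 - 0.5 * inventory2 + 1.0 * tension2 + rng.normal(size=n)
mse = np.mean((price2 - (beta_hat[0] + beta_hat[1] * inventory2)) ** 2)
print(mse)
```

The biased slope is a feature here, not a bug: it is the best linear predictor given what the forecaster observes.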
An NGO wants to maximise reach for a health campaign. They model:
\[\widehat{\text{reach}}_i = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{social media budget}_i + \hat{\beta}_2 \cdot \text{demographics}_i\]
They omit variables like local trust in institutions, existing health infrastructure, and cultural factors.
Allocate budget to locations where predicted reach is highest.
For this prediction task, orthogonality holds by construction:
\[\mathbb{E}[x \cdot e] = 0 \; \checkmark\]
Exogeneity would be required only for a causal interpretation:
\[\mathbb{E}[e \mid x] = 0\]
The same organisation may need both models for different decisions
“Where should we allocate next year’s budget for maximum reach?”
“Does increasing social media budget cause higher reach?”
Always ask: “Am I trying to predict or to understand causation?”
“If we increase training, will productivity rise?”
\[\text{productivity}_i = \beta_0 + \beta_1 \cdot \text{training}_i + e_i\]
\[\hat{\beta}_1 = 0.15 \quad (p < 0.01)\]
“Each hour of training is associated with a 0.15-unit increase in productivity”
\[\mathbb{E}[\hat{\beta}_1] = \underbrace{\beta_1}_{\text{true training effect}} + \underbrace{\gamma_2 \cdot \delta_1}_{\text{firm size effect}}\]
where \(\gamma_2\) is the effect of firm size on productivity and \(\delta_1\) is the slope from a regression of firm size on training. Both are plausibly positive, so
\[\text{Bias} = \gamma_2 \cdot \delta_1 = (+) \times (+) = (+)\]
Our estimate \(\hat{\beta}_1\) overstates the true training effect
What our estimate tells us
Firms with more training have higher productivity.
What it does NOT tell us
Giving more training will increase productivity.
The reality
\[\text{cov}(\text{training}, \text{productivity}) > 0\]
Without controlling for firm size, we cannot distinguish these stories
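The omitted-variable algebra can be verified numerically. In this sketch the values of \(\beta_1\), \(\gamma_2\), and \(\delta_1\) are invented for illustration, chosen so that \(\beta_1 + \gamma_2 \delta_1\) works out to the 0.15 seen in the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical values: true training effect, firm-size effect on
# productivity, and the slope of firm size on training
beta1, gamma2, delta1 = 0.05, 0.4, 0.25

training = rng.normal(size=n)
firm_size = delta1 * training + rng.normal(size=n)  # larger firms train more
productivity = beta1 * training + gamma2 * firm_size + rng.normal(size=n)

# Short regression that omits firm size
b_short = np.cov(training, productivity)[0, 1] / np.var(training, ddof=1)

print(b_short)  # close to beta1 + gamma2 * delta1 = 0.15, not the causal 0.05
```

The short regression recovers the projection coefficient, which bundles the true effect with the firm-size channel exactly as the bias formula predicts.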
\[\text{wage}_i = \beta_0 + \beta_1 \cdot \text{beauty}_i + e_i\]
\[e_i = \gamma_1 \cdot \text{wage}_i + \text{other factors}\]
\[\mathbb{E}[e_i | \text{beauty}_i] = \mathbb{E}[\gamma_1 \cdot \text{wage}_i | \text{beauty}_i] \neq 0\]
Beauty correlates with the error because the error contains wages, which beauty itself affects (simultaneity)
Compare:
\[\Delta\text{wage}_{i,t+1} = \beta_0 + \beta_1 \cdot \text{beauty}_{it} + e_{it}\]
\[\beta_1 = \frac{\text{cov}(\text{beauty}_t, \Delta\text{wage}_{t+1})}{\text{var}(\text{beauty}_t)}\]
Using time as a natural ordering helps establish causality
\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy}] + e_i\]
\[\Delta\text{earnings}_{i,t+1} = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy at } t] + e_{it}\]
The arrow of time provides identification when simultaneity threatens
\[\text{cov}(y_i, y_j) = 0 \text{ for } i \neq j\]
\[\mathbb{E}[e_i | x_i] = 0\]
The parameter, a fixed unknown number:
\[\beta_1\]
The estimator, a rule and hence a random variable:
\[\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x,y)}{\widehat{\text{var}}(x)}\]
The estimate, one realised value:
\[\hat{\beta}_1 = 0.073\]
We say “biased estimator”—never “biased parameter” or “biased estimate”
| Assumption | Statement | Ident. | Estim. | Infer. |
|---|---|---|---|---|
| 1 | Linearity: \(y = \beta_0 + \beta_1 x + e\) | ✓ | ✓ | |
| 2 | Random sampling | ✓ | ✓ | |
| 3 | Variation in \(x\): \(\text{var}(x) > 0\) | ✓ | ✓ | |
| 4 | Zero mean: \(\mathbb{E}[e] = 0\) | ✓ | ✓ | |
| 5 | Exogeneity: \(\mathbb{E}[e \mid x] = 0\) | ✓ | ✓ | |
| 6 | Homoskedasticity: \(\text{var}(e \mid x) = \sigma^2\) | | | ✓ |
| 7 | Normality: \(e \sim N(0, \sigma^2)\) | | | ✓ |
AS1-AS5: Are we estimating something meaningful? AS6-AS7: Is our uncertainty correct?
\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \text{desk number}_i + e_i\]
\[\begin{align*} H_{0}: \beta_1 = 0 \\ H_{1}: \beta_1 < 0 \end{align*}\]
Survey conducted at alumni meeting
\(\mathbb{P}[i\text{ attended meeting} | \text{earnings}_i] \text{ is increasing in earnings}\)
Why this biases our estimate
If \(\beta_1 < 0\) (front seats → higher earnings):
The consequence
Without selection:
\[\hat{\beta}_1 \approx \beta_1 < 0\]
With selection on earnings:
\[\hat{\beta}_1 > \beta_1\]
The bias is positive, pulling the estimate toward zero
Random treatment assignment doesn’t help when sample selection depends on the outcome
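A simulation makes the selection mechanism concrete (all numbers are hypothetical): seats are randomly assigned and the true effect is negative, but surveying only attendees, with attendance probability increasing in earnings, pulls the estimate toward zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

desk = rng.integers(1, 31, size=n).astype(float)   # randomly assigned seats
earnings = 5.0 - 0.1 * desk + rng.normal(size=n)   # hypothetical beta1 = -0.1

def slope(x, y):
    """OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b_full = slope(desk, earnings)  # full population: close to -0.1

# Survey at the alumni meeting: P(attend) is increasing in earnings
attend = rng.random(n) < 1.0 / (1.0 + np.exp(-(earnings - 4.0)))
b_survey = slope(desk[attend], earnings[attend])

print(b_full, b_survey)  # b_survey is closer to zero than b_full
```

Back-row alumni only show up when their earnings draw was high, which flattens the observed relationship even though seat assignment was random.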
Error in the outcome:
\[y = y^* + e_0\]
Error in the regressor:
\[x = x^* + e_1\]
These two types have fundamentally different consequences
\[\widehat{\text{wages}}_i = \underset{(0.66)}{3.15} - \underset{(0.33)}{0.71} \cdot \mathbb{1}[\text{beauty}_{i} < \overline{\text{beauty}}]\]
Can we conclude that below-average looking people earn less?
\[\text{plim}(\hat{\beta}_1) = \beta_1 \cdot \underbrace{\frac{\text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}}_{\lambda \in (0,1)}\]
Our estimate is an upper bound on the true (negative) effect
Attenuation makes our test conservative
\[\widehat{\log(\text{wages})}_i = \underset{(1.3)}{3.2} + \underset{(0.2)}{0.4} \cdot \text{education}_i\]
\(n = 2{,}000\)
\[y = \beta_0 + \beta_1 x^* + \underbrace{(e + e_0)}_{\upsilon}\]
\[\text{var}(\upsilon) = \text{var}(e) + \text{var}(e_0)\]
Error in \(y\) does NOT bias \(\hat{\beta}\)—only inflates variance
\[\text{plim}(\hat{\beta}_1) = \beta_1 \cdot \frac{\text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}\]
Reducing either type of measurement error strengthens our conclusions
| | Error in \(y\) | Error in \(x\) |
|---|---|---|
| What happens | \(\text{var}(\upsilon) \uparrow\) | Attenuation bias |
| Bias? | No | Yes (toward zero) |
| Efficiency? | Reduced | — |
| Assumption violated | None | AS5 (exogeneity) |
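The contrast in the table can be reproduced in a few lines (the true \(\beta_1\) and the error variances are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
beta1 = 1.0

x_star = rng.normal(size=n)                   # true regressor
y_star = beta1 * x_star + rng.normal(size=n)  # true outcome

def slope(x, y):
    """OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Measurement error in y: no bias, only extra noise
y_obs = y_star + rng.normal(size=n)
b_ey = slope(x_star, y_obs)   # close to beta1 = 1.0

# Measurement error in x: attenuation by lambda = var(x*)/(var(x*)+var(e1))
x_obs = x_star + rng.normal(size=n)
b_ex = slope(x_obs, y_star)   # close to beta1 * 0.5, since lambda = 1/(1+1)

print(b_ey, b_ex)
```

With unit variances everywhere, \(\lambda = 1/2\), so error in \(x\) halves the slope while error in \(y\) leaves it centred on the truth.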
\[y = \beta_0 + \beta_1 x^* + \underbrace{(e + e_0)}_{\upsilon}\]
\[\text{cov}(x^*, \upsilon) = \text{cov}(x^*, e) + \text{cov}(x^*, e_0) = 0 + 0 = 0 \; \checkmark\]
\[\begin{align*} y^{*} &= \beta_0 + \beta_1(x - e_1) + e \\ &= \beta_0 + \beta_1 x + \underbrace{(e - \beta_1 e_1)}_{\upsilon} \end{align*}\]
\[\begin{align*} \text{cov}(x, \upsilon) &= \text{cov}(x^* + e_1, e - \beta_1 e_1) \\ &= -\beta_1 \text{var}(e_1) \neq 0 \; \text{✗} \end{align*}\]
\[\text{plim}(\hat{\beta}_1) = \frac{\text{cov}(x, y)}{\text{var}(x)} = \frac{\beta_1 \text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}\]
\[\lambda = \frac{\text{var}(x^*)}{\text{var}(x^*) + \text{var}(e_1)}\]
\[|\text{plim}(\hat{\beta}_1)| < |\beta_1|\]
If we reject \(H_0: \beta_1 = 0\):
Attenuation provides interpretable bounds—rejection is strong evidence
| Aspect | Error in \(y\) | Error in \(x\) |
|---|---|---|
| Primary effect | ↑ variance | Bias toward zero |
| Estimator property | Unbiased, inefficient | Biased, but bounded |
| Can trust \(\hat{\beta}\)? | Yes | Direction yes, magnitude no |
| Can trust rejection? | Yes | Yes (conservative) |
Error in \(x\) more serious, but attenuation gives us something useful
\[\text{var}(e_i | x_i) = \sigma^2 \quad \forall i\]
\[\text{var}(e_i | x_i) = \sigma_i^2\]
This doesn’t bias OLS—but it breaks our standard errors
\[\text{profits}_i = \beta_0 + \beta_1 \cdot \log(\text{sales}_i) + e_i\]
where profits are measured in millions of dollars.
Larger firms likely have more variable profits:
\[\text{var}(e_i | \text{sales}_i) = \sigma_i^2\]
increasing in sales
None of AS1-AS5 involves the error variance
\(\therefore\) OLS is still unbiased
What breaks is the inference machinery: standard errors, \(t\)-statistics, and confidence intervals
\[\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\]
\[\text{var}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \sigma_i^2}{\left[\sum_{i=1}^n (x_i - \bar{x})^2\right]^2}\]
If variance increases with \(|x - \bar{x}|\): classical standard errors are too small, so we over-reject
If variance decreases with \(|x - \bar{x}|\): classical standard errors are too large, so we under-reject
Using wrong SE can go either direction—we can’t know without checking
\[\widehat{\text{var}}_{\text{robust}}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \hat{e}_i^2}{\left[\sum_{i=1}^n (x_i - \bar{x})^2\right]^2}\]
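A quick numerical check (one simulated sample with an assumed variance function, chosen to increase in \(|x|\)) shows the two formulas diverging in exactly the direction the theory predicts:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

x = rng.normal(size=n)
# Assumed heteroskedasticity: error s.d. grows with |x|
e = rng.normal(size=n) * (0.5 + np.abs(x))
y = 1.0 + 0.8 * x + e

xc = x - x.mean()
beta1_hat = (xc @ y) / (xc @ xc)
resid = y - y.mean() - beta1_hat * xc

# Classical SE: assumes var(e|x) is constant
se_classical = np.sqrt(resid @ resid / (n - 2) / (xc @ xc))
# Heteroskedasticity-robust (White) SE
se_robust = np.sqrt((xc**2 @ resid**2) / (xc @ xc) ** 2)

print(se_classical, se_robust)  # robust SE is larger in this design
```

Because the variance rises with \(|x - \bar{x}|\), the classical formula understates the sampling uncertainty; the robust formula weights each squared residual by its leverage and corrects this.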
Understanding which assumption fails guides the solution
\[\begin{align*} y &= y^* + e_0 \\ &= \beta_0 + \beta_1 x^* + e + e_0 \\ &= \beta_0 + \beta_1 x^* + \upsilon \end{align*}\]
where \(\upsilon = e + e_0\)
\[\begin{align*} \text{cov}(x^*, \upsilon) &= \text{cov}(x^*, e + e_0) \\ &= \text{cov}(x^*, e) + \text{cov}(x^*, e_0) \\ &= 0 + 0 = 0 \; \checkmark \end{align*}\]
Exogeneity preserved → OLS unbiased \(\blacksquare\)
\[\begin{align*} y^* &= \beta_0 + \beta_1(x - e_1) + e \\ &= \beta_0 + \beta_1 x - \beta_1 e_1 + e \\ &= \beta_0 + \beta_1 x + \upsilon \end{align*}\]
where \(\upsilon = e - \beta_1 e_1\)
\[\begin{align*} \text{cov}(x, \upsilon) &= \text{cov}(x^* + e_1, e - \beta_1 e_1) \\ &= \text{cov}(x^*, e) - \beta_1\text{cov}(x^*, e_1) + \text{cov}(e_1, e) - \beta_1\text{cov}(e_1, e_1) \\ &= 0 - \beta_1 \cdot 0 + 0 - \beta_1 \text{var}(e_1) \\ &= -\beta_1 \text{var}(e_1) \end{align*}\]
Since \(\text{var}(e_1) > 0\) and typically \(\beta_1 \neq 0\):
\[\text{cov}(x, \upsilon) \neq 0 \; \text{✗}\]
Exogeneity violated → OLS biased \(\blacksquare\)
\[\text{plim}(\hat{\beta}_1) = \frac{\text{cov}(x, y)}{\text{var}(x)}\]
\[\begin{align*} \text{cov}(x, y) &= \text{cov}(x^* + e_1, \beta_0 + \beta_1 x^* + e) \\ &= \beta_1 \text{cov}(x^* + e_1, x^*) + \text{cov}(x^* + e_1, e) \\ &= \beta_1 [\text{var}(x^*) + \text{cov}(e_1, x^*)] + 0 \\ &= \beta_1 \text{var}(x^*) \end{align*}\]
\[\text{var}(x) = \text{var}(x^* + e_1) = \text{var}(x^*) + \text{var}(e_1)\]